Hacker News is a social news website focusing on computer science and entrepreneurship. It is run by Paul Graham's investment fund and startup incubator, Y Combinator.
Hacker News is community driven: articles are posted and upvoted by users, and the most popular articles make it to the top of the site. The algorithm that ranks the articles is heavily influenced by time, giving new articles a significant boost. After all, it's a news site.
Content that can be submitted to Hacker News is defined as "anything that gratifies one's intellectual curiosity". If you haven't seen the site yet, check it out. You can find more details about the Hacker News algorithm in this blog post.
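To get a feel for why time matters so much, here is a minimal sketch of the commonly cited approximation of the ranking formula. This is not the official implementation (which also applies various penalties): a story's points are divided by a power of its age, so its rank decays as the story gets older.
In [ ]:
# commonly cited approximation of the Hacker News ranking formula (not the official
# implementation); gravity controls how fast older stories fall down the ranking
def approximate_hn_rank(points, age_hours, gravity=1.8):
    return (points - 1) / ((age_hours + 2) ** gravity)

# a fresh story with few points can outrank an older story with many points
print(approximate_hn_rank(10, 1))    # ~1.25
print(approximate_hn_rank(100, 24))  # ~0.28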
If you're already an avid reader of Hacker News, you might have wondered: "Is there a topic commonality between the popular Hacker News stories?" Or, put differently, if you're an author: "What topic do I have to write about to make it to the top of Hacker News?"
This notebook tries to answer these questions using the Hacker News API and the AlchemyAPI to get ranked concepts for textual data. Here are the steps we'll take:
1. Query the Hacker News API for the current top stories and save their ids to disk.
2. Query the AlchemyAPI concept tagging feature with the URL of each story.
3. Aggregate popularity measures (score and number of descendants) per extracted concept.
4. Show the aggregated concepts in tabular form, sorted by the different popularity measures.
To run this notebook, register for a free AlchemyAPI account. After you receive your API key, paste your API key in the cell below:
In [1]:
api_key = 'PASTE_ALCHEMY_API_KEY_HERE'
Import required Python libraries:
In [2]:
import requests
import os
import pandas as pd
from datetime import datetime
The Hacker News API provides an endpoint topstories that returns the 500 most highly rated stories at the time of the request.
In [3]:
hacker_news_api_base_url = 'https://hacker-news.firebaseio.com/v0/'
hacker_news_feature_url_item = 'item/'
hacker_news_feature_url_topstories = 'topstories'
hacker_news_api_parameters = '.json?print=pretty'
In [4]:
def get_story_for_id(story_id):
    ''' Queries the Hacker News API for story information for the given story_id. '''
story_request_url = hacker_news_api_base_url + hacker_news_feature_url_item + unicode(story_id) + hacker_news_api_parameters
story = requests.get(story_request_url).json()
return story
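As a quick sanity check you can call the helper with any existing story id. The story id below is just an example; according to the Hacker News API documentation, a story item contains fields such as by, descendants, id, kids, score, time, title, type, and url.
In [ ]:
# example usage with an arbitrary existing story id (8863 is used purely as an illustration)
sample_story = get_story_for_id(8863)
print(sample_story.get('title'))
print(sample_story.get('score'))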
In [5]:
def get_story_details(story):
''' Filter relevant story information from the given Hacker News API story object. '''
    # remove the comment ids ('kids') from the story, because we don't use them
if 'kids' in story: del story['kids']
# encode text field content as ascii (work around IPython defect https://github.com/ipython/ipython/issues/6799)
if 'title' in story: story['title'] = story['title'].encode('ascii', 'ignore')
if 'text' in story: story['text'] = story['text'].encode('ascii', 'ignore')
if 'url' in story: story['url'] = story['url'].encode('ascii', 'ignore')
return story
In [6]:
def get_all_story_details(story_ids):
''' Queries Hacker News API for relevant story information for given list of story_ids. '''
all_story_details = []
for story_id in story_ids:
all_story_details.append(get_story_details(get_story_for_id(story_id)))
return all_story_details
In [7]:
current_top_500_stories_url = hacker_news_api_base_url + hacker_news_feature_url_topstories + hacker_news_api_parameters
current_top_500_stories = requests.get(current_top_500_stories_url).json()
Take a look at what we got to make sure we have a list of story ids:
In [ ]:
current_top_500_stories
The top 500 stories provided by the Hacker News API are a snapshot that reflects the currently most popular stories. To enable an analysis of the most popular stories over time, it is helpful to work with a larger corpus of stories.
Let's save the story ids to disk for future use, using the Pandas convenience methods read_pickle and to_pickle, which wrap the Python pickle library:
In [9]:
story_ids_file_name = 'hacker_news_story_ids.pickle'
def update_saved_story_ids(story_ids, story_ids_file_name):
''' Read story ids from disk, merge with given story_ids, and save back to disk. '''
file_story_ids = []
try:
file_story_ids = pd.read_pickle(story_ids_file_name)
except IOError as err:
# file for story ids does not yet exist, move on
pass
merged_story_ids = set(file_story_ids).union(set(story_ids))
pd.Series(list(merged_story_ids)).to_pickle(story_ids_file_name)
return merged_story_ids
In [10]:
story_ids_up_until_today = update_saved_story_ids(current_top_500_stories, story_ids_file_name)
In [11]:
len(story_ids_up_until_today)
Out[11]:
Now, query the details and show a sample of the first five stories (if you happen to hit JSON errors, try running the cell again; they seem to occur intermittently):
In [14]:
all_story_details = get_all_story_details(list(story_ids_up_until_today))
# optionally, comment out the first line and uncomment the two lines below to work with a subset of stories and reduce subsequent requests against AlchemyAPI
# top_10_stories = list(story_ids_up_until_today)[0:10]
# all_story_details = get_all_story_details(top_10_stories)
stories_df = pd.DataFrame.from_dict(all_story_details)
stories_df.head(5)
Out[14]:
One of the features provided by AlchemyAPI is Concept Tagging. It allows extracting concepts from web-based content available at a given URL. We're going to apply concept tagging to the URLs from the Hacker News stories.
In [15]:
alchemy_api_base_url = 'http://access.alchemyapi.com/calls/url/'
alchemy_api_parameters = '?apikey=' + api_key + '&outputMode=json&url='
alchemy_feature_url_concepts = "URLGetRankedConcepts"
In [16]:
def get_concepts_for_url(story_url, story_urls_and_concepts):
''' Query AlchemyAPI concept tagging for given url and add result to given story_urls_and_concepts dictionary. '''
if story_url in story_urls_and_concepts:
# attempt to get concepts for story url from disk
concepts = story_urls_and_concepts.get(story_url)
else:
        # no concepts cached for this story url, query AlchemyAPI for concepts and save them for future use
request_url = alchemy_api_base_url + alchemy_feature_url_concepts + alchemy_api_parameters + story_url
concepts = requests.get(request_url).json()
story_urls_and_concepts[story_url] = concepts
return concepts
Let's test the function by running it against a test_url
pointing to an article from cnn.com:
In [17]:
story_urls_and_concepts = {}
test_url = 'http://www.cnn.com/2009/CRIME/01/13/missing.pilot/index.html'
get_concepts_for_url(test_url, story_urls_and_concepts)
Out[17]:
You should see a JSON document containing a list of concepts extracted from the website at the given url. Each concept is identified by its text and is assigned a relevance, which measures how confident AlchemyAPI is that the website is talking about this concept. Based on the identified concept, the JSON also contains links to publicly available knowledge bases such as DBpedia and YAGO. Feel free to test AlchemyAPI concept tagging for articles that you're interested in by replacing the test_url.
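For orientation, the response has roughly the following shape; the values below are made up for illustration and only the fields discussed above are shown:
In [ ]:
# illustrative structure of a concept tagging response (values are invented)
example_concept_response = {
    'status': 'OK',
    'url': 'http://www.cnn.com/2009/CRIME/01/13/missing.pilot/index.html',
    'concepts': [
        {
            'text': 'Aircraft',      # the identified concept
            'relevance': '0.91',     # confidence that the page is about this concept
            'dbpedia': 'http://dbpedia.org/resource/Aircraft'  # link into a knowledge base
        }
        # ... more concepts ...
    ]
}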
The free pricing tier for AlchemyAPI allows 1000 queries per day. To reduce the number of AlchemyAPI requests, we create a dictionary story_urls_and_concepts that stores each story url together with its list of detected concepts:
In [26]:
story_urls_and_concepts_file_name = 'story_urls_and_concepts.pickle'
try:
story_urls_and_concepts = pd.read_pickle(story_urls_and_concepts_file_name)
except IOError as err:
# file for story urls and concepts does not yet exist, move on
story_urls_and_concepts = {}
pass
Now that we can extract concepts for the article at a given url, we need to extract the concepts for a Hacker News story_id. For each identified concept we need to keep track of how often and from which stories it was extracted, and we need to aggregate popularity measures like score and number of descendants across those stories.
The resulting data structure is a dictionary of dictionaries containing the following information:
{
    'Programming language': {
        'occurs': 11,        # there are 11 occurrences of the concept 'Programming language' in our stories
        'score': 543,        # aggregated score of all stories containing the concept 'Programming language'
        'ids': [123, 456],   # story ids of all stories containing the concept 'Programming language'
        'descendants': 94,   # aggregated number of descendants of all stories containing 'Programming language'
        'links': ['www.cnn.com/programming_language', ...]   # links to all stories about 'Programming language'
    }
}
The following function aggregates all this information about all concepts extracted from all stories:
In [27]:
def get_concepts_for_id(story_id, all_concepts_dicts, story_urls_and_concepts):
''' Extracts concepts for given story_id and aggregates story popularity information. '''
print "Querying concepts for story " + unicode(story_id) + "..."
request_url = hacker_news_api_base_url + hacker_news_feature_url_item + unicode(story_id) + hacker_news_api_parameters
print(request_url)
story = requests.get(request_url).json()
# ignore "Ask HN" and job posts, only consider actual stories
if story.get('type') == 'story':
# make sure story has url that links to article
if story.get('url') is not None:
# extract concepts using AlchemyAPI
concept_result = get_concepts_for_url(story.get('url'), story_urls_and_concepts)
if concept_result['status'] == 'OK':
concepts = concept_result.get('concepts')
for concept in concepts:
# check, if we previously encountered the concept in another article
concept_dict = {}
concept_text = concept.get('text')
                    # ignore concepts with low relevance
if (float(concept.get('relevance')) > 0.6):
concept_dict['occurs'] = 1
concept_dict['relevance'] = concept.get('relevance')
concept_dict['ids'] = [story_id]
concept_dict['score'] = story.get('score')
concept_dict['descendants'] = story.get('descendants')
concept_dict['links'] = [story.get('url')]
if concept_text in all_concepts_dicts:
# merge additional concept info with already existing concept info
# add up the scores and number of descendants by concept
already_existing_concept = all_concepts_dicts.get(concept_text)
already_existing_concept['occurs'] = already_existing_concept['occurs'] + 1
already_existing_concept['score'] = already_existing_concept['score'] + story.get('score')
already_existing_concept['descendants'] = already_existing_concept['descendants'] + story.get('descendants')
already_existing_concept['links'] = already_existing_concept['links'] + concept_dict['links']
already_existing_concept['ids'] = already_existing_concept['ids'] + concept_dict['ids']
else:
all_concepts_dicts[concept_text] = concept_dict
return all_concepts_dicts
Let's test the get_concepts_for_id
helper function by providing it a valid Hacker News story id:
In [28]:
all_concepts_dicts = {}
test_story_id = 9226497
all_concepts_dicts = get_concepts_for_id(test_story_id, all_concepts_dicts, story_urls_and_concepts)
print all_concepts_dicts
You should see a dictionary of concepts with links to stories, score, and descendant information. Feel free to try different test_story_id values.
Now that we know our get_concepts_for_id
function works, let's query and aggregate the concepts for all Hacker News stories:
In [29]:
len(story_urls_and_concepts)
Out[29]:
In [30]:
len(stories_df)
Out[30]:
In [ ]:
story_counter = 1
for story_id in story_ids_up_until_today:
# optionally, comment the line above and uncomment the line below to limit requests to 10 stories
# for story_id in top_10_stories:
all_concepts_dicts = get_concepts_for_id(story_id, all_concepts_dicts, story_urls_and_concepts)
print 'Done. ' + unicode(story_counter) + ' stories queried.'
story_counter = story_counter + 1
Save the story_urls_and_concepts dictionary to disk for future use. The dictionary was created and updated while iterating through all story_ids and is valuable at this point, because it contains the concepts returned by the AlchemyAPI, which only supports a limited number of requests per day. Without saving story_urls_and_concepts to disk, we would hit that 1000 requests per day limit after just a few days of collecting story ids.
In [33]:
import pickle
with open(story_urls_and_concepts_file_name, 'wb') as story_urls_and_concepts_file:
pickle.dump(story_urls_and_concepts, story_urls_and_concepts_file)
In [34]:
all_concepts_df = pd.DataFrame.from_dict(all_concepts_dicts, orient='index')
all_concepts_sorted_by_score_df = all_concepts_df.sort(columns='score', ascending=False)
all_concepts_sorted_by_score_df
Out[34]:
In [35]:
all_concepts_sorted_by_descendants_df = all_concepts_df.sort(columns='descendants', ascending=False)
all_concepts_sorted_by_descendants_df
Out[35]:
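Note that sorting by the aggregated score favors concepts that simply occur in many stories. As an optional sketch (reusing the columns built above), you can divide the aggregated score by the number of occurrences to get an average score per story before sorting:
In [ ]:
# optional: average score per occurrence, to reduce the bias towards concepts
# that merely appear in many stories
all_concepts_df['score_per_story'] = all_concepts_df['score'] / all_concepts_df['occurs']
all_concepts_df.sort(columns='score_per_story', ascending=False).head(10)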
You can be the judge whether these are the topics you would have expected to be on top. As you run this notebook over time, more stories will be available. You can also run this Hacker News and AlchemyAPI.ipynb notebook on a recurring basis, for example once a day, by running the notebook Hacker News Runner.ipynb. This will aggregate data over time and allow for more detailed analysis.
In this notebook we showed the usage of the Hacker News API and the AlchemyAPI. We used the AlchemyAPI concept tagging feature to extract topics from the Hacker News stories. Finally, we aggregated popularity information about the stories for each concept and showed it in tabular form, sorted by different popularity measures.
The invocation of the AlchemyAPI was rather simple. A lot of the code that writes intermediate results to disk exists only to work around the 1000 requests per day limitation.
Concept detection is only one feature of the AlchemyAPI. Check out more features in the API documentation.
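The other URL-based features follow the same calling pattern, so switching the feature part of the request URL is all it takes. The endpoint name below (URLGetRankedKeywords for keyword extraction) is assumed from that naming scheme, so double-check it against the documentation:
In [ ]:
# sketch: reuse the same URL pattern for a different AlchemyAPI feature
# (endpoint name assumed from the documented naming scheme; verify in the API docs)
alchemy_feature_url_keywords = 'URLGetRankedKeywords'
keywords_request_url = alchemy_api_base_url + alchemy_feature_url_keywords + alchemy_api_parameters + test_url
keywords_result = requests.get(keywords_request_url).json()
print(keywords_result.get('status'))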